



 |
Charlie Calvert's C++ Builder Unleashed
- 12 -
Understanding Relational
Databases
In order to make sure everyone is following the discussion in the next few chapters,
I'm going to spend a few pages giving a quick-and-dirty introduction to relational
databases. This discussion will also include a brief overview of the Database Desktop.
My purpose here is to give a relatively concise explanation of what it means to
use a relational, as opposed to a flat-file, database. Naturally, this will be a
very broad overview of a complex and highly detailed subject. I am not attempting
an academic analysis of this field of study, but instead want to provide a practical
guide for everyday use.
In this chapter, I will be working with Paradox tables and InterBase tables. Each
database has its own unique set of rules. There is no definitive example of a relational
database, any more than there is a definitive operating system or a definitive compiler.
All databases have things in common, just as all compilers and all operating systems
have things in common. As much as possible, I try to stress these common traits throughout
this chapter. However, the specific implementation that I am referencing here is
for Paradox and InterBase databases, and not everything I say will apply to Oracle
or dBASE tables.
In particular, this chapter is about the following:
- Indices
- Primary keys
- Foreign keys
- Referential integrity
If you already understand these subjects, you probably won't have much use for
this chapter. If you need to review these subjects, or need to be introduced to them,
you should read this chapter.
Getting Started with Relational
Databases
There are many different kinds of possible databases, but in today's world, there
are only two kinds that have any significant market share for the PC:
- 1. Flat-file databases
2. Relational databases
NOTE: Emerging in recent years has a been
a new system called object-oriented databases. These databases represent an interesting
form of technology, but I will omit discussion of them here because they have a small
user base at this time.
The subject of object-oriented databases will come up again briefly in the chapters
on OOP called "Inheritance," "Encapsulation," and "Polymorphism."
In those chapters you will see that OOP has some powerful features that it can bring
to the database world.
Flat-file databases consist of a single file. The classic example would be an
address book that contains a single table with six fields in it: Name, Address, City,
State, Zip, and Phone. If that is your entire database, what you have is a flat-file
database. In a flat-file database, the words table and database are synonymous.
In general, relational databases consist of a series of tables related to each
other by one or more fields in each table. In Chapter 9, "Using TTable
and TDataSet," and Chapter 10, "SQL and the TQuery Object,"
you saw how to use the TTable and TQuery objects to relate the
Customer and Orders tables together in a one-to-many relationship.
As you recall, the two tables were joined on the CustNo field. The relationship
established between these two tables on the CustNo field is very much at
the heart of all relational databases.
The Address program shown in Chapter 13, "Flat-File, Real-World Databases,"
is an example of a flat-file database. In Chapter 14, "Sessions and Relational
Real-World Databases," you will see a second program, called KDAdd, which is
a relational database.
Here are three key differences between relational and flat-file databases:
- 1. A flat-file database, like the address book example outlined previously,
consists of one single table. That's the whole database. There is nothing more to
say about it. Each table stands alone, isolated in its own little solipsistic world.
2. Relational databases always contain multiple tables. For instance, the
Customer and Orders tables are both part of the BCDEMOS
database. As you will see, there are many other tables in this database, but for
now just concentrate on the Customer and Orders tables.
3. Tables in relational databases are tied together in special fields. These
fields are called primary and foreign keys. They are usually indexes, and they usually
consist of a simple integer value. For instance, the Customer and Orders
tables are related to one another by the CustNo field. The CustNo
field is a primary key in the Customer table, and a foreign key in the Orders
table. There are also indexes on both fields.
NOTE: Indices are about searching and
sorting. Keys, on the other hand, are about relating tables, and particularly about
something called referential integrity.
In practice, these concepts get mixed together in some pretty ugly ways, but the
underlying theory relies on the kind of distinctions I am drawing in this note. For
instance, keys are usually indexed, and so people often talk about keys and indexes
as if they were the same thing. However, they are distinct concepts.
One way to start to draw the distinction is to understand that keys are part of the
theory of relational databases, while indexes are part of the implementation of relational
databases. More on this as the chapter evolves.
Clearly relational databases are radically different from flat-file databases.
Relational databases typically consist of multiple tables, at least some of which
are related together by one or more fields. Flat-file databases, on the other hand,
consist of only one single table, which is not related to any other table.
Advantages of the Relational Database
Model
What advantages do relational databases have over flat-file databases? Well, there
are many strengths to this system; here are a few of the highlights:
- Relational databases enforce something called referential integrity. These constraints
help you enter data in a logical, systematic, and error-free manner.
- Relational databases save disk space. For instance, the Customer table
holds information about customers, including their address, phone, and contact information.
The Orders table holds information about orders, including their date, cost,
and payment method. If you were forced to keep all this information in a single table,
each order would also have to list the customer information, which would mean that
some customers' addresses would be repeated dozens of times in the database. In a
big database, that kind of duplication can easily burn up megabytes of disk space.
It's better to use a relational database because each customer's address would be
entered only once. You could also have two flat-file databases, one holding the customer
information and the other holding the orders information. The problem with this second
scenario is that flat-file databases provide no means of relating the two tables
so that you can easily see which orders belong to which customer.
- Relational databases enable you to create one-to-many relationships. For instance,
you can have one name that is related to multiple addresses. There is no simple way
to capture that kind of relationship in a flat-file database. In the KDAdd program,
you will see that it is possible to easily relate multiple addresses, phone numbers,
and so on with each name. The flexible structure of relational databases enables
programmers to adopt to these kinds of real-world situations. For many entries in
a database, you will want to keep track of two addresses: one for a person's home,
and the other for his or her work. If someone you know has a summer home or an apartment
in the city, you need to add yet more addresses to the listing. There is no convenient
way to do that in flat-file databases. Relational databases handle this kind of problem
with ease. In the last paragraph I emphasized that this kind of feature saves space;
in this paragraph I'm emphasizing that it allows for a more logical, flexible, and
easy-to-use arrangement of your data.
To summarize, a relational database offers these possibilities:
- You can view the Customer table alone, or you can view the Orders
table alone.
- You can place the two tables in a one-to-many relationship, so that you can see
them side-by-side, but only see the orders relating to the currently highlighted
customer.
- You can perform a join between the two tables, so that you see them as one combined
table, much like the combined table you would be forced to use if you wanted to "join"
the Customer and Orders tables in a single flat-file database.
However, you can decide which fields from both tables will be part of the join, leaving
out any you don't want to view. The joined table is also temporary, and does not
take up unnecessary disk space. In short, relational databases can use joins to provide
some of the benefits of flat-file databases, whereas flat-file databases cannot emulate
the virtues of relational databases.
As you can see, the three concepts that stand out when talking about relational
databases are referential integrity, flexibility, and conservation of disk space.
In this case, the word "flexibility" covers a wide range of broad features
that can only be fully appreciated over time.
The one disadvantage that relational databases have when compared to flat-file
databases is that they are more complicated to use. This is not just a minor sticking
point. Neophytes are often completely baffled by relational databases. They don't
have a clue as to what to do with them. Even if you have a relative degree of expertise,
anyone can still become overwhelmed by a relational database that consists of three
dozen tables related to one another in some hundred different ways. (And yes, complexity
on that scale is not uncommon in corporate America!) As you will see later in the
book, almost the only way to work with big systems of that type is through case tools.
Simple Set Logic: The Basis of Relational
Databases
The basis for relational databases is a very simple form of mathematics. Each
table represents a simple set that can be related to other tables through very fundamental
mathematics. Because computers are so good at math, and particularly at integer math,
they find relational databases easy to manipulate.
One common feature of relational databases is that most records will have a unique
number associated with them, and these numbers will be used as the keys that relate
one table to another. This enables you to group tables together using simple mathematical
relationships. In particular, you can group them using simple integer-based set arithmetic.
For instance, in the Customers table from BCDEMOS, there is
a unique CustNo field in each record. Furthermore, the Orders table
has a unique OrderNo field associated with it. The Orders table
also has a CustNo field that will relate it to the Customer table.
The terminology of relational databases expresses these ideas by saying that the
Customer table has a primary key called CustNo, and the Orders
table has a primary key called OrderNo and a foreign key called CustNo:
| Tablename |
Primary key |
Foreign key (secondary index) |
| Customer |
CustNo |
|
| Orders |
OrderNo |
CustNo |
Given this scenario, you can say "Show me the set of all orders such that their
CustNo field is equal to X or within the range of X - Y." Computers
love these kinds of simple mathematical relationships. It's their bread and butter.
In essence, you are just asking for the intersection of two sets: "Show me the
intersection of this record from the Customer table with all the records
from the Orders table." This intersection will consist of one record
from the Customer table with a particular CustNo plus all the records
from the Orders table that have the same CustNo in their foreign
key.
These CustNo, OrderNo, AuthorNo, BookNo, and
similar fields might also be used in flat-file databases as indexes, but they play
a unique role in relational databases because they are the keys used to relate different
tables. They make it possible to reduce the relationship between tables to nothing
more than a simple series of mathematical formulas. These formulas are based on keys
rather than on indexes. It is merely a coincidence that most keys also happen to
be indexed.
Viewing Indices and Keys in DBD
or the Explorer
In the next few sections I define primary and secondary keys, and describe how
to use them. It might be helpful if I preface this discussion with a brief description
of how to view keys using some of the tools that ship with BCB. This is just a preliminary
look at this material. I cover it again in greater depth later in this chapter in
a section called "Exploring the Indices in the BCDEMOS Database."
NOTE: Right now it is not so important
that you understand what primary and foreign keys do, but only that you know how
to view them using the tools that ship with the product. The theory will become clear
as the chapter progresses.
There are two ways to view the indexes and keys on a table. The best way is in
the Database Explorer. Open up the Explorer and view the BCDEMOS database
as shown in Figure 12.1.
Click the Orders table and open up the Referential Constraints branch
as shown in Figure 12.2. Notice that there are two constraints on this table, one
called RefCustInOrders and the second called RefOrders. The RefCustInOrders
field defined CustNo as a foreign key that relates to the CustNo
field in the Customer table.
A second way to view this key is in the Database Desktop. Set the Working Directory
from the File menu to BCDEMOS. Open up the Orders table in the
Database Desktop and select Table | Info structure from the menu. Drop down the Table
Properties and select Referential Integrity, as shown in Figure 12.3.
FIGURE
12.1. Viewing the BCDEMOS database
in the Database Explorer.
FIGURE
12.2. The primary and foreign fields of
the Orders table.
FIGURE
12.3. Selecting Referential Integrity
in the Database Desktop.
Double-click RefCustInOrders to bring up the Referential Integrity dialog
shown in Figure 12.4.
FIGURE
12.4. The CustNo field in the
Orders table relates to the CustNo field in the Customer
table.
The fields in the left side of this dialog belong to the Orders table.
On the right is a list of all the tables in the database. In the center, you can
see that the CustNo field has been selected from the Orders table
and the CustNo field has been selected from the Customer table.
The primary key of the Customer table is related to the foreign key of the
Orders table.
Now go back to the Database Explorer and open up the Indices branch of the Orders
table, as shown in Figure 12.5.
Note that you can see the names of the indexes, here labeled as <primary>
and as CustNo. The fields found in the indexes are also displayed. For instance,
you can see that the primary index consists of the OrderNo field and the
secondary index consists of the CustNo field.
FIGURE
12.5. The primary and CustNo
indexes on the Orders table.
I am showing these to you so that you will begin to see the distinction between
keys and indexes. The two concepts are distinct. For further proof of this, open
up the IBLOCAL database in the Database Explorer. Use SYSDBA as
the user name, and masterkey as the password. Now open up the Employee project
table as shown in Figure 12.6. Note that there are separate listings for the index,
primary key, and foreign keys.
FIGURE
12.6. The Employee_Project table
has three indexes, one primary key, and two foreign keys.
In practice, almost all keyed fields will also have indexes. This leads people
to think the two concepts are the same. However, indexes are about searching and
sorting, and keys are about referential integrity. These distinctions will become
blurred at times, but it helps if you can keep it in your mind that they are different
ideas. The actual details concerning these distinctions will become clear in the
next few pages.
You can also see the indexes for a table inside the Database Desktop. To get started,
open up the Orders table and select Table | Info Structure from the menu.
The fields with the stars beside them are part of the primary index. Drop down the
Table Properties combo box to view the secondary indexes. Double-click the indexes
you see to view the details of their design. If you want to change the structure
of a table, choose Table | Restructure from the menu, rather than Table | Info Structure.
Most of the time, I find the Database Desktop is the right tool to use when I
want to create or modify a table, and the Database Explorer is the right tool to
use when I want to view the structure of a table. However, I often find myself jumping
back and forth between the two tools, to get the best features of each. Later in
the book I will talk about case tools, which are generally superior to either of
the products discussed in this section. However, there are no case tools that ship
with BCB, so I emphasize the universally available tools in this text.
Throughout the ensuing discussion, you might have occasion to use the Database
Explorer to examine the structure of the Customer, Orders, Items,
and Parts tables. These are the tables I use when defining what relational
databases are all about.
Rule Numero Uno: Create a Primary
Key for Each Table!
The last two sections have introduced you to some of the key concepts in relational
databases. If there is one lesson to take out of this chapter, it is the importance
of creating a unique numerical key in the first field of most tables you create.
This field is called a primary key. In both Paradox and InterBase, it is impossible
to create a primary key without also simultaneously creating an index.
If you want to have a list of addresses in a table, don't just list the Address,
City, State, and Zip. Be sure to also include an integer-based
CustNo, AddressNo, or Code field. This field will usually
be both an index and the first field of the database. It is the primary key for your
table, and must be, by definition, unique. That is, each record should have a unique
Code field associated with it.
The primary key
- Serves as the means of differentiating one record from another
- Is also in referential integrity
- Can also help with fast searches and sorts
As I said earlier, the distinction between indexes and keys becomes blurred at
times. However, they are distinct concepts and you should endeavor to discover the
differences.
NOTE: In this discussion I am taking a
liberty in saying that you have to create a primary key for a table in a relational
database. In fact, you can simply create a field that contains a unique integer value.
It doesn't have to be an index. However, making a unique index for the field will
speed up the operation of your database, and it will help enforce rules that make
it easy to create robust relational databases. In particular, the restraints on a
primary key make it impossible for you to create two fields in one table with the
same primary key.
Just to make sure this is clear, I'll go ahead and list out the right and wrong
way to create a table.
Right Method
CustNo: Integer
LastName, FirstName, Address, City, State,
Zip: string
Wrong Method
LastName, FirstName, Address, City, State,
Zip: string
The first example is "correct" because it has a primary index called
CustNo. It is declared as a unique Integer value. The second example
is "wrong" because it omits a simple numerical field as the primary index.
NOTE: I put the words "correct"
and "wrong" in quotes because there really are no hard-and-fast rules in
this discipline. There are occasions when you might not want to create a table that
has a simple integer as a primary index. However, ninety-nine percent of the time,
that's exactly what you want to do.
At the height of a warm May spring day, there is such a thing as a rose bush that
has no buds or flowers. However, the whole point of rose bushes in May is that they
flower. I doubt we would feel quite the same way about roses if they did not have
beautiful blooms. In the same way, relational databases without primary indexes wouldn't
garner quite so much attention as they do now.
I should add that not all primary indexes are numeric fields. For instance, many
tables might use alpha fields containing values such as HDA1320WW35180.
I'm stressing simple numeric fields in this chapter because they are easy to work
with and easy to understand.
Even if you don't yet understand how databases work, for now I would suggest automatically
adding a simple numerical value in a primary index to all your tables. Do so even
if you are not using the field at this time. Believe me, as you come to understand
relational databases, you will see why I recommend doing this in most, though not
all, cases. At this point, however, you will probably be better off creating the
extra field and letting it go to waste, even if you don't understand why you are
doing it. After you get a better feeling for relational databases, you will understand
intuitively when the field is needed, and when you are encountering one of those
rare occasions when it is going to be useless.
When people first work with relational databases, they can get a little hung up
about the overhead involved in creating all these extra key fields. The point to
remember is that these fields allow the database to be treated as nothing more than
sets of simple integers related together in various combinations. Computers fly through
integer math. Adding these extra index fields to your tables makes your data become
computer-friendly. Computers love those simple integer fields; your computer will
show its thanks by running faster if you add them to your tables!
Computers don't feel weighed down by the extra field any more than a car feels
weighed down by a steering wheel, people feel weighed down by their hands, or a rose
bush feels weighed down by a rose. Relational databases want you to add an extra
integer field as a primary index to your tables!
Remember, people like beautiful paintings, eloquent words, and lovely members
of the opposite sex. Computers like logic. They like numbers, they like nice, clean,
easily defined relationships! They like simple, integer-based primary keys in the
first field of a table!
One-to-Many Relationships: The
Data and the Index
One good way to start to understand relational databases is by working with the
Customer, Orders, Items, and Parts tables from
the BCDEMOS database. All four of these tables are related in one-to-many
relationships, each-to-each. That is, the Customer table is related to the
Orders table, the Orders table to the Items table, and
the Items table to the Parts table. (The relationship also works
in the opposite direction, but it may be simpler at first to think of it as going
in only one direction.)
| Master |
Detail |
Connector (primary key and foreign key) |
| Customer |
Orders |
CustNo |
| Orders |
Items |
OrderNo |
| Items |
Parts |
PartNo |
Read the preceding table as a series of rows, starting left and moving to the right,
as if they were sentences. The preceding list shows that the Customer and
Orders tables are related in a one-to-many relationship, with Customer
being the master table and Orders being the detail table. The connector
between them is the CustNo field. That is, they both have a CustNo
field.
The CustNo field is the primary key of the Customer table and
the foreign key of the Orders table. The OrderNo field is the primary
key of the Orders table and a foreign key of the Items table. The
PartNo field is the primary key of the Parts table and a foreign
key of the Items table.
The relationship between these tables can be reversed. For instance, the Parts
table could become the master table and the Items table the detail table,
and so on, back down the line. The reason you can reverse the relationship becomes
clear when you think in purely mathematical terms. The Customer table has
a series of CustNo fields. Say the CustNo for the first record
is 1000. To get the Orders associated with that customer, you ask
this question: "What are all the rows from the Orders table that have
a CustNo of 1000?" That is:
Select * from Orders where CustNo = 1000
Clearly, you could reverse this question. If you select a particular row from
the Orders table, you could find which item from the Customer table
it is related to by asking for the set of all Customer records with a CustNo
of 1000. Because the CustNo field for the Customer table
is a unique index, you will get only one record back. However, the way you relate
the tables is still the same:
Select * from Customer where CustNo = 1000
Working with Primary Keys
The Parts, Orders, Items, and Customer tables
have various keys. As it happens, these keys are also indexes. An index enables you
to sort tables on a particular field. A key helps you define the relationship between
two tables, or otherwise group related bits of information by a set of predefined
and automatically enforced rules.
Unfortunately, sadly, and confusingly, you can still relate tables even without
the presence of any keys or indexes. For instance, if there were no CustNo
primary and foreign keys in the Customer and Orders tables, Paradox
would still let you use SQL to relate the tables in a one-to-many relationship. However,
in this scenario, performance would be slow because there is no index, and there
would be no constraints on the data you could enter in the two tables because there
would be no primary and foreign keys that define referential integrity. In this scenario
you are back to the rosebush-without-a-rose phenomena. Yes, the tables are still
part of a relational database, but they lack the features that make a relational
database appealing. You need both the keys and the indexes to make a relational database
appealing.
I'll draw a distinction between only two different kinds of keys. The first kind
I will discuss is called a primary key. The second is called a foreign key.
- A primary key is a unique value used to identify a record in a table. It is usually
numerical, and it is usually indexed. It can be combined with a foreign key to define
referential integrity. I will talk more about referential integrity later in this
chapter.
- Because it is indexed, the primary key defines the default sort order for the
table. When you first open up a table, it will be automatically sorted on this field.
If a table does not have a primary index, records will appear in the order in which
they were added to the table. For all practical purposes, a table without an index
has no defined order in which records will appear.
- With Paradox tables, each entry in the primary index must be unique. That is,
you can't have two CustNos in the Customer table that are the same.
You can, however, have multiple foreign keys that are not unique.
- It is legal to have multiple fields in the primary index of a Paradox table.
This is called a composite index. These fields must be sequential, starting with
the first field in the table. You can't have the primary index consist of the first,
third, and fourth fields of a table. A composite index with three fields must consist
of the first, second, and third fields. If you have a FirstName and a LastName
field in your database, they can both be part of the primary index. You should, however,
declare the LastName before the FirstName, so that your index will
list people alphabetically by last name: CustNo, LastName, FirstName.
- The primary and foreign keys are never composite. They always consist of one
field.
Creating a primary key enables you to have two people with the same name, but
with different addresses. For instance, you can list a John Doe on Maple Street who
has a CustNo of 25, and a John Doe on Henry Street who has a CustNo
of 2000. The names may be the same, but the database can distinguish them
by their CustNo. Once again, this shows why databases love those simple
integer indexes. If the database had to sort on the address fields every time it
tried to distinguish these two John Does, it would take a long time for the sort
to finish.
Computers can easily distinguish the number 25 from the number 2000,
but it takes them longer to do a string compare on "Maple Street"
and "Henry Street". Furthermore, just comparing the streets wouldn't
be enough; you would also have to compare cities, states, and so on. If two entries
with the same name were both missing addresses, the whole system would be in danger
of falling apart altogether. The same thing would happen if two people named John
Doe lived at the same address. Use those integer indexes; they make your life simpler!
Working with Secondary Indices
and Foreign Keys
It's now time to move on to a consideration of foreign keys. The CustNo
field of the Orders table is a foreign key because it relates the Orders
table to the primary key of the Customer table. It is also a secondary index
which aids in sorting and searching through data. Indices also speed up operations
such as joins and other master-detail relationships.
When writing this section, I have found it difficult to totally divorce the idea
of foreign key and secondary indexes. However, I will try to split them up into two
categories, taking foreign keys first:
- A foreign key provides a means for relating two tables according to a set of
predefined rules called referential integrity.
- In Paradox, you use the Referential Integrity tools from the Database Desktop
to define foreign keys. There is no such thing as a composite foreign key.
- Using SQL you can relate two tables in a one-to-many relationship even if there
is no index or key in either table. However, your performance will be better if you
have indexes. There will be no way to enforce referential integrity if you don't
define foreign and primary keys.
- Using the TTable object, it is impossible to relate two tables in a
one-to-many relationship without indexes. (This is one of the points that doesn't
clearly belong in either the section on keys, or the one on indexes. It relates to
both subjects.)
Here are some facts about secondary indexes:
- A secondary index provides an alternative sort order to the one provided by the
primary key.
- You need to explicitly change the index if you want to switch away from the primary
index to a secondary index. Remember that the default sort order for a Paradox table
is provided by the primary index. If you want to switch from the primary index to
a secondary index, you need to change the IndexName or IndexFieldName
property of your table. If you want to use the primary index, you don't have to do
anything; the table will sort on that field automatically.
- An index that contains more than one field is called a composite index. You can
create composite secondary indexes, which means the indexes will contain multiple
fields. In practice, fields such as FirstName and LastName can
often be part of a secondary index, because your primary index is usually a unique
numerical value. Sometimes a primary index will consist of three fields, such as
the CustNo, FirstName, and LastName fields.
- In Paradox tables all primary and foreign keys must be indexed. You can't define
referential integrity without indexes, and in particular, you must have a primary
key. Furthermore, in InterBase tables, the act of defining a primary or foreign key
will automatically generate an index. (Once again, this is an item that doesn't clearly
belong in either the discussion of keys or of indexes, but rather it relates to both.
As I said earlier, there are times when the distinction between the two subjects
becomes blurred.)
If you are new to databases, you will undoubtedly be frustrated to discover that
different databases have varying rules for setting up indexes, keys, and so on. In
this book, I tend to use Paradox tables as the default, but I also spend considerable
time describing InterBase tables. If you use some other database, such as dBASE,
Oracle, or Sybase, you should be sure to read up on the basic rules for using those
tools. For instance, some databases let you set up a foreign key that is not an index.
In the Paradox and InterBase world, however, foreign keys are always accompanied
by an index, so the two words become synonymous, particularly in the hands of people
who don't really understand how relational databases work.
The good news is that you will find that overall there are certain basic principles
that define how databases work. The details may vary from implementation to implementation,
but the fundamental ideas stay the same.
Keys Are the Keys to the Kingdom!
Let me take this whole paradigm even one step further. When I first looked at
a database, I thought of it as a place to store information. After spending a lot
of time with relational databases, I now think of them primarily as a way to relate
bits of information through keys and indexes.
I know this is putting the cart before the horse, but what really interests me
about databases now is not the fact that they contain information per se, but that
I can query them to retrieve related bits of information. In other words, I'm more
interested in the logic that defines how tables relate to one another than I am in
the information itself.
No one can get excited about a list of addresses or a list of books. The lists
themselves are very boring. What's interesting is the system of keys and indexes
that relate tables together, and the various SQL statements you can use to ask questions
against various sets of tables.
When I picture a table, I see its primary and foreign keys as great big pillars,
and I envision all the rest of the data as a little stone altar that is dwarfed by
the pillars. Like a pagan temple, it's the pillars that you notice first; the altar
is just a small stone structure you might overlook until someone points it out. Of
course the temple is built around the altar, and databases are built around their
data. But in practice it is easy to overlook the data. You care about the pillars,
and you care about the primary and foreign keys. The rest tends to fade into the
background.
Give me a well-designed database with lots of interrelated tables and I can have
fun asking it all sorts of interesting questions. It's not the data per se that is
important, but the way the data is related!
The act of properly relating a set of tables in a database is called, tragically
enough, "normalizing" the data. Where this dreadful term came from I have
no idea, but "normalizing" a database is the fun part of creating a database
application.
Exploring the Keys and Indices
in the BCDEMOS Database
I am now going to look again at the tools that ship with BCB, and show how to
use them to view and create indexes and keys. This examination of the subject will
have greater depth than the quick overview presented earlier in this chapter.
Here is a list of the indexes on the Customers, Orders, Items,
and Parts tables:
| Table name |
Primary indexes |
Secondary indexes |
| Customer |
CustNo |
Company |
| Orders |
OrderNo |
CustNo |
| Items |
OrderNo, ItemNo |
OrderNo, PartNo |
| Parts |
PartNo |
VendorNo, Description |
Notice that the Items table has a composite primary index consisting of
the OrderNo and ItemNo fields. It also has two secondary indexes,
one on the OrderNo field and one on the PartNo field. The Parts
table has two secondary indexes, one on the VenderNo, and one on the Description
field.
If you do not have a pre-made list like this one, you could find this information
in at least four ways:
- The Object Inspector
- The Database Explorer
- The Database Desktop
- By creating a program that leverages the methods of the TSession object.
Such a program will be shown in Chapter 14.
I will explain all these methods and then discuss some possible alternative techniques.
If you drag the Customer table off the Explorer and onto a form, you
will be able to view its Indices in the Object Inspector. If you drop down the IndexName
property editor, you will see that there is one index listed there. This is the secondary
index, called ByCompany. If you select this index, the table will sort on
the Company field.
If you set the IndexName property back to blank, the table will sort
automatically on the primary index, which is the CustNo field. In other
words, BCB never explicitly lists the primary index in the IndexName property
editor. I suppose that the architects of the VCL assumed that all tables have a primary
index, and that if you don't specify a particular index name, you want to sort on
that index. Of course, it is not an error to create a table that has no primary index,
and BCB can still work with that kind of table.
You can also drop down the IndexFieldNames property, which gives you
a list of the fields that are indexed, in this case the CustNo and Company
fields. Here you can see the fields included in the primary index, but they are not
marked as belonging to any particular index.
NOTE: To study an interesting case, drop
down the Items table on a form. Recall that it has a primary index on the
OrderNo and ItemNo fields, and secondary indexes on the OrderNo
and PartNo fields. If you drop down the index field names, you see the following
list:
OrderNo
OrderNo; ItemNo
PartNo
The first item is the ByOrderNo index--the second the primary index--and
the third, the PartNo index.
The IndexName and IndexFieldNames properties give you a handy
way of tracking Indices at design time. They don't, however, give you all
the information you might need, such as exactly what fields make up which parts of
the primary and secondary Indices. In this case, you could probably guess,
but it would still be nice to get a more definitive answer.
If you open up the Database Explorer, expand the BCDEMOS node, the Tables
node, the Customer node, and finally the Indices node, you get
(naturally enough) a list of the Indices on the Customer table!
This is a great feature, and you should use it whenever possible. Figure 12.7 shows
the expanded nodes of the Indices for the Customer table. (The
program kdAddExplore in the Chap14 subdirectory on the CD-ROM that accompanies
this book uses the TSession object to do the same thing in a BCB program.)
While you have the Explorer open, you should also expand the Fields node,
as shown in Figure 12.8. This gives a quick list of all the fields and their types.
Notice that you can drag and drop individual fields onto a form.
A third way to get a look at the structure of a table is through the Database
Desktop (DBD). You can open this program from the Tools menu in C++Builder. Use the
File menu in the DBD to set the Working Directory to the BCDEMOS Alias.
Open up the Customer table and choose the Table | Info Structure menu choice.
Drop down the Table Properties combo box and look up the secondary Indices,
as shown in Figure 12.9. The primary index is designated by the asterisks after the
keyed fields in the Key Roster. In this case, only the CustNo field is starred,
because it is the sole keyed field.
FIGURE
12.7. The Indices of the Customer
table viewed in the Database Explorer.
FIGURE
12.8. The Fields view of the Customer
table from the Database Explorer.
FIGURE
12.9. The Database Desktop struts its
venerable features by displaying the Indices on the Customer table.
NOTE: Over time, the Database Desktop
will probably be replaced entirely by the Explorer. However, there are still some
things that the DBD does better than the Explorer, so both products are shipped with
C++Builder.
Notice the Save As button on the Info Structure dialog. You can use this to save
a table that contains the structure of the Customer table. You can then
print this out on a printer using TQuickReports. Be sure to use a fixed-size
font, not a proportional font:
Field Name Type Size Key
CustNo N *
Company A 30
Addr1 A 30
Addr2 A 30
City A 15
State A 20
Zip A 10
Country A 20
Phone A 15
FAX A 15
TaxRate N
Contact A 20
LastInvoiceDate @
In the example shown here, I have printed out only the first four fields of the
table because of space considerations. (The fields are Field Name, Type,
Size, and Key.) If I then recursively print out the structure of
the table used to house the structure of the Customer table, I get the following
report:
Field Name Type Size Key
Field Name A 25
Type A 1
Size S
Key A 1
_Invariant Field ID S
_Required Value A 1
_Min Value A 255
_Max Value A 255
_Default Value A 255
_Picture Value A 176
_Table Lookup A 255
_Table Lookup Type A 1
This is the same information found in the Data Dictionary, and it should prove
sufficient under most circumstances.
Using the Database Desktop to Create
Indexes
To create a unique primary key in a Paradox table, open up the Database Desktop,
and create a table with the first field declared as an Integer or autoincrement value.
Place a star next to the first field, which tells Paradox to create a primary index
on it, as shown in Figure 12.10.
FIGURE
12.10. Place asterisks next to the first
field or fields of a table to designate the primary index.
To create a secondary index, drop down the table properties list and choose Secondary
Indices. (See Figure 12.11.) Click the Define button. Select the fields from your
table that you want to be part of your index. Click OK. A simple dialog will then
pop up asking you to name the index. I usually give the index a name based on the
fields being indexed. For instance, if I want to create an index on the CustNo
field, I would call the index CustNo, CustNoIndex, or ByCustNo.
If I wanted to create one on a field called Name, I would call the index
Name, NameIndex, or ByName.
FIGURE
12.11. Creating a secondary index in a
Paradox table.
Using the Database Desktop to Create
Primary and Foreign Keys
To create a primary or foreign key on a Paradox table you need to define something
called referential integrity. You cannot define referential integrity without first
defining primary keys on both tables involved. There also must be an index on the
foreign key, but this index will be created automatically for you when you create
the foreign key.
In InterBase, the situation is somewhat different. The act of creating primary
or foreign keys will automatically define indexes. As I said earlier, there are little
variations on the main themes of relational databases, depending on what kind of
database you use.
In the Data subdirectory from the CD that ships with this book you will
find two tables called MasterTable and DetailTable. Figure 12.10
shows how to use the Database desktop to create the MasterTable. These tables
look like this, with the MasterTable listed first and the DetailTable
listed second:
| Field name |
Type |
Size |
Primary index? |
| Code |
+ |
|
* |
| Name |
A |
25 |
|
| Field name |
Type |
Size |
Primary index? |
| Code |
+ |
|
* |
| MasterCode |
I |
|
|
| SubName |
A |
25 |
|
To create referential integrity between these two tables, you should open up the
DetailTable in the Database Desktop. Open the Table | Restructure menu item.
Select Referential Integrity from the Table Properties combo box. Click the Define
button, and set things up so they look like they do in Figure 12.12. Click the OK
button and give this relationship a name, such as RefMasterDetail.
FIGURE
12.12. Defining referential integrity
between the DetailTable and MasterTable.
When you are done, you will have created primary keys and foreign keys on the
MasterTable and DetailTable. The best way to see these keys is
in the Database Explorer. On my system I used the BDE Configuration Utility to create
an alias called CUnleashed that points at the Data subdirectory.
If you open this alias in the Database Explorer and go to MasterTable, you
can see the primary and foreign keys, which Paradox calls Primary and Foreign Fields.
Why Use Referential Integrity?
Referential integrity is one of the most valuable tools in a database programmer's
arsenal. In particular, referential integrity will help guide the user so that they
do not accidentally enter invalid data, or accidentally delete needed records.
To see referential integrity in action, use the Database Desktop to enter two
records in the MasterTable. The first should have the word Day
in the Name field and the second should have the word Month in
the Name field. You do not have to fill in the Code field, because
it is an autoincrement field (+) and will be updated automatically.
| Code |
Name |
| 1 |
Days |
| 2 |
Months |
In the DetailTable, enter in a few names of days of the week or months of
the year in the SubName field. Give the MasterCode field a 1
if you are entering a day, and 2 if you are entering a month.
| Code |
MasterCode |
SubName |
| 1 |
1 |
Monday |
| 2 |
1 |
Tuesday |
| 3 |
2 |
January |
| 4 |
2 |
February |
| 5 |
2 |
March |
With this data in the tables, you could define a one-to-many relationship such that
if you viewed the MasterTable record with Days in the Name
field you would see only the days in the DetailTable, and if you selected
Months, you would see only the month names from the DetailRecord.
Referential integrity will do two things to help make sure that these tables stay
in good shape. It will prevent you from deleting a record in the MasterTable
that has detail records associated with it in the DetailTable. For instance,
if you select the MasterTable, set the Database Desktop in Edit mode and
press Control+Delete, you will not be able to delete a record from the MasterTable.
Referential integrity will prevent you from entering a value in the MasterCode
field of the DetailTable that is not in the primary key of the MasterTable.
For instance, if you tried to enter the number 3 in the DetailTable's
MasterCode field, you would get the error message "Master field
missing". This is because there is no record in the MasterTable
with a Code field of 3. Of course, if you added a record to the
MasterTable with a Code field that had 3 in it, the database
would let you enter the data.
Needless to say, these rules are also enforced inside BCB. In your own programs,
you might want to create exception handlers that would pop up messages that explained
to the user exactly what was wrong, and why they could not perform a particular operation.
Most users would not respond well to an exception that said no more than "Master
field missing!"
That is the end of my explanation of relational databases. In the last few pages
you have learned about primary keys, foreign keys, indexes, referential integrity,
and how all these pieces fit together to help you create robust applications. In
the next few pages I will step you through some simple examples that illustrate these
points.
One-to-Many Relationships: The
Code
Now that you know something about the data in the Customer, Orders,
Items, and Parts tables, it's time to link them together in a single
program called Relate. To get started, begin a new project and add a data
module to it. Place four TTable objects and four TDataSource objects
on the data module, wire each data source to a TTable object, and then wire
each of the TTable objects to one of the four tables mentioned earlier.
You can also rename the TTable and TDataSource objects so that
they correspond with their respective tables, as shown in Figure 12.13.
FIGURE
12.13. The data module for the Relate
project.
Drop four TDBGrid objects on the main form for the project. Use the File
| Include Unit Header menu option to link Form1 to DataModule1.
Wire the grids to the datasources on the datamodule, making sure that each grid has
its DataSource property assigned to a unique object. For instance, link
the first grid to the Customer table, the second to the Orders
table, and so on.
Using the names visible in Figure 12.4, click the OrdersTable component
and set its MasterSource property equal to CustomerSource, that
is, set its MasterSource equal to the TDataSource object that is
linked to the TTable object that hosts the Customer table. Set
the ItemsTable MasterSource property equal to OrdersSource
and the PartsTable MasterSource equal to ItemsSource.
Click the OrdersTable MasterFields property and link up the
Orders and Items tables on the CustNo field, as described
in Chapter 9, "Using TTable and TDataSet." In the same
way, hook up the TblItems to OrdersTable ká[infinity]the OrderNo
field, and PartsTable to ItemsTable on the PartNo field.
If you set all the tables to active and then run the program, the result should look
like what you see in Figure 12.14.
Spend a little time mucking about with this program. Notice, for instance, that
if you change the selected item in the Customer table, the contents of the
grids showing the Orders, Items, and Parts tables will
change. In particular, notice that the CustNo in all the items in the Orders
table is always equal to the CustNo in the currently selected item in the
Customer table. The same thing can be said about the OrderNo field
in the Orders and Items tables, and the PartNo field in
the Items and Parts tables.
In general, selecting one item at any level but the lowest in the hierarchy will
force many detail records to change. That is why these are called one-to-many relationships.
One record in the Orders table points to many records in the Items
and Parts tables.
FIGURE
12.14. The Relate program at runtime.
NOTE: In this particular example, you
might notice that the Parts table is always arranged in a one-to-one relationship
with the Items table. However, if you reverse the order of these tables
and make the Parts table the master, the arrangement will look more like
a proper one-to-many relationship. However, it is not wrong to make either table
the master. The point is simply to arrange the tables so that you get the information
from them that you want to obtain.
This discussion of the Relate program has given you a look at some of
the important features in the Database Explorer and Database Desktop. It has also
given you a quick run-down on some of the key ideas behind the construction of relational
databases. The point here is that C++Builder has lots of built-in tools that help
you construct relational databases. There is more that I want to say about these
topics, even in this rather sketchy overview of a complicated subject. In particular,
I have not yet talked about joins.
Relational Databases and Joins
In the last section, you saw how to relate the Customers, Orders,
Items, and Parts tables in a one-to-many relationship that is sometimes
called a master-detail relationship. In this section, you will again relate all four
tables, but in a different kind of relationship, called a join.
You had a look at joins in the last chapter, "Working with Field Objects."
This time the query that you need to build is a bit longer:
SELECT DISTINCT d.Company, d1.AmountPaid, d2.Qty,
d3.Description, d3.Cost, d3.ListPrice
FROM "Customer.db" d, "Orders.db" d1,
"Items.db" d2, "Parts.db" d3
WHERE (d1.CustNo = d.CustNo)
AND (d2.OrderNo = d1.OrderNo)
AND (d3.PartNo = d2.PartNo)
ORDER BY d.Company, d1.AmountPaid, d2.Qty,
d3.Description, d3.Cost, d3.ListPrice
Though not horrendously complicated, the syntax shown here is still ugly enough
to give some people pause.
The basic principles involved in this kind of statement are simple enough to describe.
All that's happening is that the Customer, Orders, Items,
and Parts tables are being joined together into one large table of the type
you would have to create if you were trying to track all this information in a single
flat-file database. The one proviso, of course, is that not all the fields from the
four tables are being used. In fact, the only ones mentioned are
d.Company, d1.AmountPaid, d2.Qty,
d3.Description, d3.Cost, d3.ListPrice
Here the d, d1, d2, and d3 are described in
the following From clause:
"Customer.db" d, "Orders.db" d1,
"Items.db" d2, "Parts.db" d3
The Order By clause, of course, simply defines the sort order to be used
on the table created by this join. I am guilty here of using meaningless variable
names. In general, you should choose identifiers more informative than d1
or d2.
You can create a program that performs this join by dropping a TQuery,
TDataSource, and TDBGrid on a form. Wire the objects together,
wire the TQuery to the BCDEMOS database, and set its SQL
property to the query shown previously. A sample program called FourWayJoin demonstrates
this process. The output from the program is shown in Figure 12.15.
If you are not familiar with this kind of join, you might want to bring up the
Relate and FourWayJoin tables side by side and compare them. Look,
for instance, at the Action Club entries in the FourWayJoin program and trace them
through so that you see how they correspond to the entries in the Relate program.
Both programs describe an identical set of relationships; they just show the outcome
in a different manner.
Notice that the AmountPaid column in the FourWayJoin program has the
same number repeated twice in the Action Club section, as shown in Figure 12.15.
In particular, the numbers $1,004.80 and $20,108 both appear twice. This is because
there are two different items associated with these orders, as you can tell from
glancing at the Parts table in the Relate program.
FIGURE
12.15. The FourWayJoin program demonstrates
a join between four tables.
NOTE: Unless you are already familiar
with this material, be sure to run the FourWayJoin and Relate programs and switch
back and forth between them until you understand why the FourWayJoin program works
as it does. I find it easy to understand the Relate program at a glance, but the
FourWayJoin program is a bit more subtle.
Joins and QBE
The FourWayJoin program is a good advertisement for the power of SQL. Once you
had the SQL statement composed, it was simple to put the program together. All the
work is embodied in just a few lines of code, and everything else was trivial to
construct. SQL can help concentrate the intelligence of a program in one small area--or
at least it does in this one example.
The sticking point, of course, is that not everyone is a whiz at composing SQL
statements. Even if you understand SQL thoroughly, it can still be confusing to try
to string together all those interrelated Select, Order By, From,
and Where clauses. What is needed here is a way to automate this process.
Most of the versions of C++Builder ship with a very useful tool that makes it
easy to compose even relatively complex SQL statements. In particular, I'm talking
about the QBE tool in the Database Desktop. If you want, you can use the Query Builder
instead, or some other third-party tool that you might favor. However, in this section
of the book, I will concentrate on the QBE tool, because it will be available to
nearly all readers of this book. (QBE is also built into Paradox. Furthermore, there
are some third-party QBE components on the market. The Query Builder only ships with
the client/server version of C++Builder or Delphi.)
Start the DBD and set the Working Directory to the BCDEMOS alias. Choose
File | New | QBE Query from the menu. A dialog will appear listing the tables in
the BCDEMOS database. Select the Customer table. Reopen the Select
File dialog by clicking the Add Table icon in the Toolbar. You can find the Add Table
icon by holding the mouse over each icon until the fly-by help comes up or until
you see the hint on the status bar. You can also simply look for the icon with the
plus sign on it. Continue until you have added the Customer, Orders,
Items, and Parts tables to the query. You can multiselect from
inside the FileOpenDialog. Resize the query window until all four tables
are visible, as shown in Figure 12.16.
FIGURE
12.16. Four tables used in a single QBE
example.
To join these tables together, select the Join Tables icon, located just to the
right of the lightning bolt. Click once on the Join Tables icon, and then click the
CustNo fields for the Customer and Orders tables. The
symbol "join1" will appear in each field. Click the Join Tables
icon again, and link the Orders and Items tables on the OrderNo
field. Join the Parts and Items tables on the PartNo field.
After joining the tables, select the fields you want to show by clicking once
in the check box associated with the fields you want to view. When you are done,
the result should look like Figure 12.17.
FIGURE
12.17. The complete QBE query for joining
the Customer, Orders, Items, and Parts tables.
To test your work, click the lightning bolt icon once. You should get a table
that looks just like the one in the FourWayJoin program. You will find a copy of
this QBE query in the Chap12 directory on the CD-ROM that accompanies this
book.
To translate the QBE statement into SQL, first close the result table so you can
view the query shown in Figure 12.17. Click once on the SQL icon to perform the translation.
You can save this SQL to disk, or just block-copy it and deposit it in the SQL
property of a TQuery object.
On paper, this process takes a few minutes to explain. However, once you understand
the QBE tool, you can use it to relate multiple tables in just a very few seconds.
For most people, QBE is probably the simplest and fastest way to compose all your
SQL Select statements. Don't neglect learning to use this tool. It's a simple,
easy-to-use tool that can save you hours of time.
NOTE: The only peculiarity of the QBE
tool is that by default it saves its output in a text-based language called QBE,
rather than in SQL. However, once you press the SQL button, it converts the QBE code
to SQL, thereby rendering the exact same results produced by standard SQL query builders.
Once again, the great advantage of the QBE tool over other SQL tools is that it ships
with the DBD product that accompanies nearly all versions of C++Builder. If you have
access to a more powerful SQL builder, you might want to use it instead of the QBE
tool. However, QBE works fine in most circumstances, even when running against SQL
data in an InterBase table.
What I have said here, is, of course, heresy to many members of the hard-core client
server crowd. They tend to have a natural aversion to QBE, just as C++ and Object
Pascal programmers shy away from BASIC. However, QBE ships for free with both Paradox
and the Database Desktop, and it will meet the needs of ninety percent, but not all,
of the programmers out there. So it's worth a look, yes?
That's it for the discussion of the basic principles of relational databases.
You've seen how to build master-detail relationships, and how to construct joins.
More importantly, you've seen how C++Builder encapsulates these key aspects of relational
database design. There is, of course, much more to the theory of relational databases.
There are whole books on this subject, particularly on the best way to design relational
databases.
Which Database Should I Use?
If you are not sure of which database to use, I would tentatively suggest using
Paradox to get started. It has a robust set of rules for enforcing data integrity,
a rich set of types, and some nice features such as autoincrement fields. It works
fine on a network, as long as everyone can attach their PCs to one centralized server
and you aren't expecting a large number of simultaneous users.
If you are expecting 30 or more simultaneous users, I would bite the bullet financially
and switch to InterBase or to another standard SQL server such as Oracle, Sybase,
or MS SQL Server. You could have a hundred or even two hundred users hitting a Paradox
table at the same time, but I wouldn't recommend it. If you have a hundred users,
but only ten or fifteen are likely to be after a table at one time, I would still
feel comfortable with Paradox, though I would start leaning in the direction of a
real client/server database.
Client/server databases such as InterBase will
- Let you talk to the server over TCP/IP, or some other network protocol
- Put much of the computational burden on the server rather than the client machine
- Prove to be much more robust
- Enable you to use stored procedures, triggers, views, and other advanced server-side
technologies
Remember that when I make suggestions about databases or about anything else,
I am usually not so much trying to establish a definitive standard as I am trying
to give reasonable advice to those readers who are not sure which way to turn.
Summary
My suggestion at this point is to dig into relational databases and learn as much
about them as you can. Raw data sitting on a disk is boring. Rows of data in a grid
are boring. Relational databases, however, are innately interesting. This is the
fun part of database programming. Play around with indexes, or play around with joins
and one-to-many relationships. The name of the game here is to find ways to arrange
data in relational tables so that you can get at it easily. When you arrange data
correctly, it's amazing to see how quickly you can locate very obscure pieces of
information. In fact, a number of very fun games, such as Civilization or the Ultima
series, rely heavily on databases in order to further the game play. Take some time
to dig into this stuff. It's more interesting than you might think.
If you are wishing that I had spent more time on InterBase tables, don't worry,
because I cover that topic heavily later in the book. Much of the material covered
in this chapter will be reviewed again, in much shorter form, in the light of the
InterBase server.
|